Abstract

The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged
in a hierarchy. This problem has applications to network traffic monitoring, anomaly
detection, and DDoS detection. We present a new streaming approximation algorithm for
computing Hierarchical Heavy Hitters that has several advantages over previous algorithms. It
improves on the worst-case time and space bounds of earlier algorithms, is conceptually simple
and substantially easier to implement, offers improved accuracy guarantees, is easily adapted
to a distributed or parallel setting, and can be efficiently implemented in commodity hardware
such as ternary content-addressable memories (TCAMs). We present experimental results showing
that for parameters of primary practical interest, our two-dimensional algorithm is superior
to existing algorithms in terms of speed and accuracy, and competitive in terms of space, while
our one-dimensional algorithm is also superior in terms of speed and accuracy for a more limited
range of parameters.

1 Introduction 

Finding heavy hitters, or frequent items, is a fundamental problem in the data streaming paradigm. 
As a practical motivation, network managers often wish to determine which IP addresses are sending 
or receiving the most traffic, in order to detect anomalous activity or optimize performance. Often, 
the large volume of network traffic makes it infeasible to store the relevant data in memory. Instead, 
we can use a streaming algorithm to compute (approximate) statistics in real time given sequential 
access to the data and using space sublinear in both the universe size and stream length. 

We present and analyze a streaming approximation algorithm for a generalization of the Heavy 
Hitters problem, known as Hierarchical Heavy Hitters (HHHs). The definition of HHHs is motivated
by the observation that some data are naturally hierarchical, and ignoring this when tracking 
frequent items may mean the loss of useful information. Returning to our example of IP addresses, 
suppose that a single entity controls all IP addresses of the subnet 021.132.145.*, where * is a 
wildcard byte. It is possible for the controlling entity to spread out traffic uniformly among this 
set of IP addresses, so that no single IP address within the set of addresses 021.132.145.* is a 
heavy hitter. Nonetheless, a network manager may want to know if the sum of the traffic of all IP 
addresses in the subnet exceeds a specified threshold. 

One can expand the concept further to consider multidimensional hierarchical data. For example,
one might track traffic between source-destination pairs of IP addresses at the router level. In
that case, the network manager may want to know if there is a Heavy Hitter for network traffic
at the level of two IP addresses, between a source IP address and a destination subnet, between a
source subnet and a destination IP address, or between two subnets. This motivates the study of
the two-dimensional HHH problem.

Figure 1: Sample execution of Space Saving with 3 counters. Each counter tracks an item (denoted
by a letter), and the estimated frequency of that item. The smallest counter is boldfaced and
italicized.

There is some subtlety in the appropriate definitions, as it makes sense to require that an 
element is not marked as an HHH simply because it has a significant descendant, but because the 
aggregation of its children makes it significant; otherwise, the algorithm returns redundant, less 
helpful information. We present the definitions shortly, following previous work that has explored 
HHHs for both one-dimensional and multi-dimensional hierarchies [6, 7, 8, 11, 12, 15, 23].

HHHs have many applications, and have been central to proposals for real-time anomaly detection
[25] and DDoS detection [22]. While IP addresses serve as our motivating example throughout
the paper, our algorithm applies to arbitrary hierarchical data such as geographic or temporal data. 
We demonstrate that our algorithm has several advantages, combining improved worst-case time 
and space bounds with more practical advantages such as simplicity, parallelizability, and superior 
performance on real-world data. 

Our algorithm utilizes the Space Saving algorithm, proposed by Metwally et al. [18], as a
subroutine. Space Saving is a counter-based algorithm for estimating item frequencies, meaning
the algorithm tracks a subset of items from the universe, maintaining an approximate count for
each item in the subset. Specifically, the algorithm input is a stream of pairs (i, c) where i is an
item and c > 0 is a frequency increment for that item. At each time step the algorithm tracks a
set T of items, each with a counter. If the next item i in the stream is in T, its counter is updated 
appropriately. Otherwise, the item with the smallest counter in T is removed and replaced by 
i, and the counter for i is set to the counter value of the item replaced, plus c. This approach 
for replacing items in the set may seem counterintuitive, as the item i may have an exaggerated 
count after placement, but the result is that if T is large enough, all Heavy Hitters will appear in 
the final set. Indeed, Space Saving has recently been identified as the most accurate and efficient 
algorithm in practice for computing Heavy Hitters [5], and, as we later discuss, it also possesses
strong theoretical error guarantees [2].
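
The replacement rule just described can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the implementation used in the paper: the class name `SpaceSaving`, its methods, and the per-item `errors` bookkeeping (the counter value inherited at insertion time, which yields a lower bound on the true count) are names chosen here, and a plain dictionary is scanned for the minimum instead of using a min-heap or the Stream Summary structure.

```python
class SpaceSaving:
    """Space Saving with at most m counters (dictionary-based sketch)."""

    def __init__(self, m):
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        self.counts = {}   # tracked item -> estimated count (an upper bound)
        self.errors = {}   # tracked item -> counter value inherited on insertion

    def update(self, item, c=1):
        """Process the stream pair (item, c) with c > 0."""
        if item in self.counts:
            self.counts[item] += c
        elif len(self.counts) < self.m:
            self.counts[item] = c
            self.errors[item] = 0
        else:
            # Evict the item with the smallest counter; the newcomer
            # inherits that counter value plus its own increment.
            victim = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = inherited + c
            self.errors[item] = inherited

    def estimate(self, item):
        """Return (lower bound, upper bound) on the true frequency of item."""
        if item in self.counts:
            return self.counts[item] - self.errors[item], self.counts[item]
        # An untracked item can be no more frequent than the smallest counter.
        full = len(self.counts) == self.m
        return 0, (min(self.counts.values()) if full else 0)
```

With two counters and the stream a, b, a, c, a, d, the item d ends up tracked with an exaggerated upper bound of 3 although it occurred once, while the true heavy hitter a is retained with an exact count.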

1.1 Related Work 

We require some notation to introduce prior related work and our contributions; this notation is
more formally defined in Section 2. In what follows, $N$ is the sum of all frequencies of items in the
stream, $\epsilon$ is an accuracy parameter so that all outputs are within $\epsilon N$ of their true count, and $H$
represents the size of the hierarchy (specifically, the size of the underlying lattice) the data belongs
to. Unitary updates refer to systems where the count for an item increases by only 1 on each step,
or equivalently, where we just count item appearances.

The one-dimensional HHH problem was first defined in [6], which also gave the first streaming
algorithms for it. Several possible definitions and corresponding algorithms for the multi-dimensional
problem were introduced in [7, 8]. The definition we use here is the most natural, and was
considered in several subsequent works [12, 23]. In terms of practical applications, multi-dimensional
HHHs were used in [11] to find patterns of traffic termed "compressed traffic clusters", in [25]
for real-time anomaly detection, and in [22] for DDoS detection.

The Space Saving algorithm was used in [15] in algorithms for the one-dimensional HHH problem.
Their algorithms require $O(H^2/\epsilon)$ space, while our algorithm requires $O(H/\epsilon)$ space. Very
recently, [23] presented an algorithm for the two-dimensional HHH problem, requiring $O(H^{3/2}/\epsilon)$
space.

Other recent work studies the HHH problem with a focus on developing algorithms well-suited to
commodity hardware such as ternary content-addressable memories (TCAMs) [13]. Our algorithms
are also well-suited to commodity hardware, as we describe in Section 5. The primary difference
between the present work and [13] is that the algorithms of [13] reduce overhead by only updating
rules periodically, rather than on a per-packet basis. This leads to lightweight algorithms with no
provable accuracy guarantees. However, simulation results in [13] suggest these algorithms perform
well in practice. In contrast, our algorithms possess very strong accuracy guarantees, but likely
result in more overhead than the approach of [13]. Which approach is preferable may depend on
the setting and on the constraints of the data owner.

1.2 Our Contributions 

In solving the Approximate HHH problem, there are three metrics that we seek to optimize: the 
time and space required to process each update and to output the list of approximate HHHs and 
their estimated frequencies, and the quality of the output, in terms of the number of prefixes in the
final output and the accuracy of the estimates. Our approach has several advantages over previous 
work. 

1. The worst-case space bound of our algorithm is $O(H/\epsilon)$. Notice this does not depend on the
sum of the item frequencies, $N$, as $H$ depends only on the size of the underlying hierarchy and
is independent of $N$. This beats the worst-case space bound of $O(\frac{H}{\epsilon} \log \epsilon N)$ from [7] and [8],
the $O(H^2/\epsilon)$ bound for the one-dimensional algorithm of [15], and the $O(H^{3/2}/\epsilon)$ bound for
the two-dimensional algorithm of [23]. Additionally, our algorithm provably requires $o(H/\epsilon)$
space under realistic assumptions on the frequency distribution of the stream.

2. The worst-case time bound for our algorithm per insertion operation is $O(H \log \frac{1}{\epsilon})$ in the case
of arbitrary updates and $O(H)$ in the case of unitary updates. Again this does not depend
on $N$. Previous time bounds per insert were $O(H \log \epsilon N)$ in [6, 7, 8].

3. We obtain a refined analysis of error propagation to achieve better accuracy guarantees and 
provide non-trivial bounds on the number of HHHs output by our algorithm in one and two 
dimensions. These bounds were not provided for the algorithms in [6, 7, 8].

4. The space usage of our algorithm can be fixed a priori, independent of the sum of frequencies
$N$, as it only depends on the number of counters maintained by each instance of Space Saving,
which we set at $\frac{1}{\epsilon}$ in the absence of assumptions about the data distribution. In contrast, the
space usage of the algorithms of [7] and [8] depends on the input stream, and these algorithms 
dynamically add and prune counters over the course of execution, which can be infeasible in 
realistic settings. 

5. Our algorithm is conceptually simpler and substantially easier to implement than previous
algorithms. We firmly believe programmer time should be viewed as a resource similar to
running time and space. We were able to use an off-the-shelf implementation of Space Saving,
but this fact notwithstanding, we still spent roughly an order of magnitude less time
implementing our algorithms, compared to those from [7, 8].

6. Our algorithms extend easily to more restricted settings. For example, we describe in Section 5
how to efficiently implement our algorithms using TCAMs, how to parallelize them, how to
apply them to distributed data streams, and how to handle sliding windows or streams with 
deletions. 

We present experimental results showing that for parameters of primary practical interest, our 
two-dimensional algorithm is superior to existing algorithms in terms of speed and accuracy, and 
competitive in terms of space, while our one-dimensional algorithm is also superior in terms of 
speed and accuracy for a more limited range of parameters. In short, we believe our algorithm 
offers a significantly better combination of simplicity and efficiency than any existing algorithm. 

2 Notation, Definitions, and Setup 
2.1 Notation and Definitions 

As mentioned above, the theoretical framework developed in this section was described in [8], and 
considered in several subsequent works [12, 23].

In examples throughout this paper, we consider the IP address hierarchy at bytewise granularity:
for example, the generalization of 021.132.145.146 by one byte is 021.132.145.*, by two bytes
is 021.132.*.*, by three bytes is 021.*.*.*, and by four bytes is *.*.*.*. In two dimensions, we
consider pairs of IP addresses, corresponding to source and destination IPs. Each IP prefix that
is not fully general in either dimension has two parents. For example, the two parents of the IP
pair (021.132.145.146, 123.122.121.120) are (021.132.145.*, 123.122.121.120) and (021.132.145.146,
123.122.121.*).
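
As an illustration of bytewise generalization, the sketch below computes the parents of a source-destination pair. The function names `generalize` and `parents` are hypothetical and introduced only for this example, and prefixes are assumed to be dotted strings whose wildcards are all trailing.

```python
def generalize(prefix):
    """Generalize a dotted IPv4 prefix by one byte.

    Returns None when the prefix is already fully general (*.*.*.*).
    """
    parts = prefix.split(".")
    # Replace the rightmost specified byte with a wildcard.
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] != "*":
            parts[i] = "*"
            return ".".join(parts)
    return None


def parents(pair):
    """Parents of a (source, destination) pair: generalize one dimension at a time."""
    src, dst = pair
    result = []
    g = generalize(src)
    if g is not None:
        result.append((g, dst))
    g = generalize(dst)
    if g is not None:
        result.append((src, g))
    return result
```

Calling `parents(("021.132.145.146", "123.122.121.120"))` returns the two parents listed in the text, while a pair that is fully general in one dimension has a single parent.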

In general, let the dimension of our data be $d$, and the height of the hierarchy in the $i$-th
dimension be $h_i$. In the case of pairs of IP addresses, $d = 2$ and $h_1 = h_2 = 4$. Denote by par(e, i)
the generalization of element e on dimension i; for example, if

e = (021.132.145.*, 123.122.121.120)

then par(e, 1) = (021.132.*.*, 123.122.121.120) and par(e, 2) = (021.132.145.*, 123.122.121.*).
Denote the generalization relation by $\prec$; for example,

(021.132.145.*, 123.122.121.120) $\prec$ (021.132.*.*, 123.122.*.*).

Define $p \preceq q$ by $(p \prec q) \vee (p = q)$. The generalization relation defines a lattice structure in
the obvious manner. We overload our notation to define the sublattice of a set of elements $P$ as
$(e \preceq P) \iff \exists p \in P$ such that $e \preceq p$. Let $H$ denote the total number of nodes in the lattice:
$H = \prod_{i=1}^{d} (h_i + 1)$.

We call an element fully specified if it is not the generalization of any other element, e.g.
021.132.145.163 is fully specified. We call an element e fully general in dimension i if par(e, i) does
not exist. We refer to the unique element that is fully general in all dimensions as the root. For
ease of reference, we label each element in the lattice with a vector of length $d$, whose $i$-th entry
is at most $h_i$, to indicate which lattice node the element belongs to, with the vector corresponding
to each fully specified element having $i$-th entry equal to $h_i$, and the vector corresponding to the
root having all entries equal to 0. For example, the element (021.132.145.*, 123.122.121.120) is
assigned vector (3,4), and (021.*.*.*, 123.122.121.*) is assigned vector (1,3). We define Level(i)
of the lattice to be the set of labels for which the sum of the entries in the label equals i. We
overload terminology and refer to an element p as a member of Level(i) if the label assigned to p
is in Level(i). Let $L = \sum_{i=1}^{d} h_i$ denote the deepest level in the hierarchy, that of the fully specified
elements.
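
The labels and lattice parameters can be computed directly, as in the following sketch for bytewise IP hierarchies. The helper names `lattice_size`, `deepest_level`, and `label` are assumptions made for this example; the second element passed to `label` below, (021.*.*.*, 123.122.121.*), is one illustrative element with label (1, 3).

```python
from math import prod


def lattice_size(heights):
    """H: the number of lattice nodes, i.e. the product of (h_i + 1)."""
    return prod(h + 1 for h in heights)


def deepest_level(heights):
    """L: the level of the fully specified elements, i.e. the sum of the h_i."""
    return sum(heights)


def label(element):
    """Label of an element: the number of specified bytes in each dimension."""
    return tuple(
        sum(1 for byte in prefix.split(".") if byte != "*") for prefix in element
    )
```

For pairs of IP addresses, `lattice_size([4, 4])` is 25 and `deepest_level([4, 4])` is 8; the level of an element is the sum of the entries of its label.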

Definition 2.1. (Heavy Hitters) Given a multiset $S$ of size $N$ and a threshold $\phi$, a Heavy Hitter
(HH) is an element whose frequency in $S$ is no smaller than $\phi N$. Let $f(e)$ denote the frequency of
each element $e$ in $S$. The set of heavy hitters is $HH = \{e : f(e) \ge \phi N\}$.

From here on, we assume we are given a multiset S of (fully-specified) elements from a (possibly 
multidimensional) hierarchical domain of depth $L$, and a threshold $\phi$.

Definition 2.2. (Unconditioned count) Given a prefix $p$, define the unconditioned count of $p$ as
$f(p) = \sum_{(e \in S) \wedge (e \preceq p)} f(e)$.

The exact HHHs are defined inductively as the set of prefixes whose conditioned count exceeds $\phi N$,
where the conditioned count of a prefix is the sum of the frequencies of all its descendants that are
neither HHHs themselves nor the descendant of an HHH. Formally:

Definition 2.3. (Exact HHHs) The set of exact Hierarchical Heavy Hitters is defined inductively.

1. $\mathcal{HHH}_L$, the hierarchical heavy hitters at level $L$, are the heavy hitters of $S$, that is the fully
specified elements whose frequencies exceed $\phi N$.

2. Given a prefix $p$ from Level($l$), $0 \le l < L$, define $\mathcal{HHH}^p_{l+1}$ to be the set $\{h : h \in \mathcal{HHH}_{l+1} \wedge h \prec p\}$,
i.e. $\mathcal{HHH}^p_{l+1}$ is the set of descendants of $p$ that have been identified as HHHs. Define the
conditioned count of $p$ to be $F_p = \sum_{(e \in S) \wedge (e \preceq p) \wedge (e \not\preceq \mathcal{HHH}^p_{l+1})} f(e)$. $\mathcal{HHH}_l$ is defined as

$\mathcal{HHH}_l = \mathcal{HHH}_{l+1} \cup \{p : (p \in \mathrm{Level}(l)) \wedge (F_p \ge \phi N)\}$.

3. The set of exact Hierarchical Heavy Hitters $\mathcal{HHH}$ is defined as the set $\mathcal{HHH}_0$.

Figure 2 displays the exact HHHs for a two-dimensional hierarchy defined over an example
stream. 
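
To make the inductive definition concrete, here is a brute-force sketch restricted to one dimension at bytewise granularity. It stores the entire stream, so it illustrates the definition only and is not a streaming algorithm; the function names and the use of dotted strings are assumptions of this example, and the threshold test uses "at least $\phi N$" as in Definition 2.1.

```python
from collections import Counter


def ancestors(item, height=4):
    """Generalizations of a fully specified item, itself included.

    Entry j of the returned list lies at level height - j, so the list runs
    from the item itself up to the root.
    """
    parts = item.split(".")
    return [".".join(parts[:k] + ["*"] * (height - k)) for k in range(height, -1, -1)]


def exact_hhh_1d(stream, phi, height=4):
    """Return {prefix: conditioned count} for the exact HHHs of the stream."""
    freq = Counter(stream)            # f(e) for fully specified elements e
    n = sum(freq.values())            # N
    hhh = {}
    for level in range(height, -1, -1):      # bottom-up over the levels
        cond = Counter()
        for item, f in freq.items():
            chain = ancestors(item, height)
            p = chain[height - level]
            # Skip items already covered by an HHH strictly below p.
            if any(q in hhh for q in chain[:height - level]):
                continue
            cond[p] += f
        for p, fp in cond.items():
            if fp >= phi * n:
                hhh[p] = fp
    return hhh
```

For a stream with ten copies of 1.2.3.4, one copy each of 1.2.3.10 through 1.2.3.19, and one copy of 9.9.9.9, a threshold of $\phi = 0.4$ marks both 1.2.3.4 and 1.2.3.* as HHHs with conditioned count 10 each, while 1.2.*.* is not marked because all of its traffic is already accounted for.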

Finding the set of hierarchical heavy hitters and estimating their frequencies requires linear 
space to solve exactly, which is prohibitive. Indeed, even finding the set of heavy hitters requires 
linear space [20], and the hierarchical problem is even more general. For this reason, we study the 
approximate HHH problem. 

Definition 2.4. (Approximate HHHs) Given parameter $\epsilon$, the Approximate Hierarchical Heavy
Hitters problem with threshold $\phi$ is to output a set of items $P$ from the lattice, and lower and upper
bounds $f_{\min}(p)$ and $f_{\max}(p)$, such that they satisfy two properties, as follows.

1. Accuracy. $f_{\min}(p) \le f(p) \le f_{\max}(p)$, and $f_{\max}(p) - f_{\min}(p) \le \epsilon N$ for all $p \in P$.

2. Coverage. For all prefixes $p$, define $P_p$ to be the set $\{q \in P : q \prec p\}$. Define the conditioned
count of $p$ with respect to $P$ to be $F_p = \sum_{(e \in S) \wedge (e \preceq p) \wedge (e \not\preceq P_p)} f(e)$. We require for all prefixes
$p \notin P$, $F_p < \phi N$.

Figure 2: Example depicting exact HHHs for a two-dimensional stream of IP addresses at bytewise
granularity, using the threshold $\phi N = 10$. The exact HHHs consist of ordered pairs of
source-destination IP-address prefixes (s denotes source and d denotes destination). Unconditioned
counts of each HHH are on the left, and conditioned counts for each HHH are on the right.
The stream consists of ten repetitions of the item (a.b.c.d, w.x.y.z), followed by one instance each
of items (a.b.c.i, w.x.y.i), (a.b.i.d, w.x.y.i), and (a.b.c.i, w.i.y.z) for all i in the range 0 to 9. Here
a, b, c, d, w, x, y, and z represent some distinct integers between 10 and 255.



Intuitively, the Approximate HHH problem requires outputting a set $P$ such that no prefix
with large conditioned count (with respect to $P$) is omitted, along with accurate estimates for the
unconditioned counts of prefixes in $P$. One might consider it natural to require accurate estimates
of the conditioned counts of each $p \in P$ as well, but as shown in [12], $\Omega(1/\phi^{d+1})$ space would be
necessary if we required equally accurate estimates for the conditioned counts, and this can be
excessively large in practice.



2.2 Our Algorithm, Sketched 

Our algorithm utilizes the Space Saving algorithm, proposed by Metwally et al. [18], as a subroutine,
so we briefly describe it and some of its relevant properties. As mentioned, Space Saving takes as
input a stream of pairs (i, c), where i is an item and c > 0 is a frequency increment for that item.
It tracks a small subset T of items from the stream with a counter for each $i \in T$. If the next item
i in the stream is in T, its counter is updated appropriately. Otherwise, the item with the smallest
counter in T is removed and replaced by i, and the counter for i is set to the counter value of the
item replaced, plus c. We now describe guarantees of Space Saving from [2].

Let $N$ be the sum of all frequencies of items in the stream, let $m$ be the number of counters
maintained by Space Saving, and for any $j < m$, let $N^{\mathrm{res}(j)}$ denote the sum of all but the top $j$
frequencies. Berinde et al. [2] showed that for any $j < m$,

$$|\hat{f}(i) - f(i)| \le \frac{N^{\mathrm{res}(j)}}{m - j}, \qquad (1)$$

where $\hat{f}(i)$ and $f(i)$ are the estimated and true frequencies of item $i$, respectively. By setting $j = 0$,
this implies that $|\hat{f}(i) - f(i)| \le \frac{N}{m}$, so only $\frac{1}{\epsilon}$ counters are needed to ensure error at most $\epsilon N$ in any
estimated frequency. For frequency distributions whose "tails" fall off sufficiently quickly, Space
Saving provably requires $o(\frac{1}{\epsilon})$ space to ensure error at most $\epsilon N$ (see [2] for more details).
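
The step from the general bound to the $\frac{1}{\epsilon}$ counter count can be written out explicitly. In the display below, $\hat{f}(i)$ denotes the estimated frequency and $N^{\mathrm{res}(0)}$ the sum of all frequencies with none of the top ones removed, which is simply $N$; the numeric values at the end are illustrative only.

```latex
\[
  |\hat{f}(i) - f(i)| \;\le\; \frac{N^{\mathrm{res}(0)}}{m - 0} \;=\; \frac{N}{m},
  \qquad\text{so}\qquad
  m = \frac{1}{\epsilon} \;\Longrightarrow\; |\hat{f}(i) - f(i)| \;\le\; \epsilon N .
\]
% Illustrative numbers: with \epsilon = 10^{-3} and N = 10^{6},
% m = 1000 counters suffice for an additive error of at most 1000.
```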

Using a suitable min-heap based implementation of Space Saving, insertions take $O(\log m)$
time, and lookups require $O(1)$ time under arbitrary positive counter updates. When all updates
are unitary (of the form c = 1), both insertions and lookups can be processed in $O(1)$ time using
the Stream Summary data structure [18].

Our algorithm for HHH problems is conceptually simple: it keeps one instance of a Heavy Hitter 
algorithm at each node in the lattice, and for every update e we compute all generalizations of e and 
insert each one separately into a different Heavy Hitter data structure. When determining which 
prefixes to output as approximate HHHs, we start at the bottom level of the lattice and work towards 
the top, using the inclusion-exclusion principle to obtain estimates for the conditioned counts of 
each prefix. We output any prefix whose estimated conditioned count exceeds the threshold $\phi N$.

We mention that the ideas underlying our algorithm have been implicit in earlier work on 
HHHs, but have apparently been considered impractical or otherwise inferior to more complicated 
approaches. Notably, [8] briefly proposes an algorithm similar to ours based on sketches. Their
algorithm can handle deletions as well as insertions, but it requires more space and has significantly 
less efficient output and insertion procedures. Significantly, this algorithm is only mentioned in [8] 
as an extension, and is not studied experimentally. An algorithm similar to ours is also briefly 
described in [12] to show the asymptotic tightness of a lower bound argument. Interestingly, they
clearly state their algorithm is not meant to be practical. Finally, [6] describes a procedure similar
to our one-dimensional algorithm, but concludes that it is both slower and less space efficient than 
other algorithms. We therefore consider one of our primary contributions to be the identification 
of our approach as not only practical, but in fact superior in many respects to previous more 
complicated approaches. 

We chose the Space Saving algorithm [18] as our Heavy Hitter algorithm. In contrast, the
algorithms of [7, 8] are conceptually based on the Lossy Counting Heavy Hitter algorithm [17]. A
number of the advantages enjoyed by our algorithm can be traced directly to our choice of Space
Saving over Lossy Counting, but not all. For example, the one-dimensional HHH algorithm of [15]
is also based on Space Saving, yet our algorithm has better space guarantees.

3 One-Dimensional Hierarchies 

We now provide pseudocode for our algorithm in the one-dimensional case, which is much simpler 
than the case of arbitrary dimension. As discussed, we use the Space Saving algorithm at each node 
of the hierarchy, updating all appropriate nodes for each stream element, and then conservatively 
estimate conditioned counts to determine an appropriate output set. 

InitializeHHH()

1 Initialize an instance SS(n) of Space Saving with $\frac{1}{\epsilon}$
  counters at each node n of the hierarchy.

InsertHHH(element e, count c)

1 /* Line 4 tells the n-th instance of Space Saving
     to process c insertions of prefix p */
2 for all p such that $e \preceq p$
3     Let n be the lattice node that p belongs to
4     UpdateSS(SS(n), p, c)

OutputHHH1D(threshold $\phi$)

1 /* par(e) is parent of e */
2 Let $s_e = 0$ for all e
3 /* $s_e$ conservatively estimates the difference
     between unconditioned and conditioned counts of e */
4 for each e in postorder
5     $(f_{\min}(e), f_{\max}(e))$ = GetEstimateSS(SS(n), e)
6     if $f_{\max}(e) - s_e \ge \phi N$
7         print(e, $f_{\min}(e)$, $f_{\max}(e)$)
8         $s_{\mathrm{par}(e)}$ += $f_{\min}(e)$
9     else $s_{\mathrm{par}(e)}$ += $s_e$

Figure 3 illustrates an execution of our one-dimensional algorithm on a stream of IP addresses
at byte-wise granularity.

Figure 3: Example depicting our one-dimensional algorithm on a stream of IP addresses at byte-wise
granularity, where each instance of Space Saving maintains 3 counters. The top grid depicts
the state at time T, and the bottom grid depicts the state at time T+1, after processing the update
(w.x.y.z, +3). The minimum counter for each instance of Space Saving is boldfaced and italicized.
If OutputHHH1D is run at time T+1 with threshold $\phi N = 12$, the approximate HHHs output
would be h.i.m.n, w.x.y.*, h.i.j.*, a.b.c.*, and q.r.*.*.
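
The three one-dimensional procedures can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the class names `SpaceSaving` and `HHH1D` and all method names are hypothetical, items are dotted strings at bytewise granularity, Space Saving is a simple dictionary version with a linear scan for the minimum, and the postorder traversal is replaced by a bottom-up sweep over levels, which also visits every child before its parent.

```python
class SpaceSaving:
    """Dictionary-based Space Saving with at most m counters."""

    def __init__(self, m):
        self.m, self.counts, self.errors = m, {}, {}

    def update(self, item, c=1):
        if item in self.counts:
            self.counts[item] += c
        elif len(self.counts) < self.m:
            self.counts[item], self.errors[item] = c, 0
        else:
            victim = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item], self.errors[item] = inherited + c, inherited

    def estimate(self, item):
        """Return (f_min, f_max) for item."""
        if item in self.counts:
            return self.counts[item] - self.errors[item], self.counts[item]
        full = len(self.counts) == self.m
        return 0, (min(self.counts.values()) if full else 0)


class HHH1D:
    def __init__(self, counters, height=4):
        self.height = height
        self.n = 0  # N, the sum of all increments seen so far
        # One Space Saving instance per lattice node (levels 0..height).
        self.ss = [SpaceSaving(counters) for _ in range(height + 1)]

    def _prefix(self, item, level):
        parts = item.split(".")
        return ".".join(parts[:level] + ["*"] * (self.height - level))

    def _parent(self, prefix, level):
        parts = prefix.split(".")
        parts[level - 1] = "*"
        return ".".join(parts)

    def insert(self, item, c=1):
        self.n += c
        for level in range(self.height + 1):  # every prefix generalizing item
            self.ss[level].update(self._prefix(item, level), c)

    def output(self, phi):
        threshold = phi * self.n
        result = []
        s = [dict() for _ in range(self.height + 1)]  # s[level][e] holds s_e
        for level in range(self.height, -1, -1):
            # Tracked prefixes, plus any prefix that received a discount.
            candidates = set(self.ss[level].counts) | set(s[level])
            for e in sorted(candidates):
                f_min, f_max = self.ss[level].estimate(e)
                s_e = s[level].get(e, 0)
                is_hhh = f_max - s_e >= threshold
                if is_hhh:
                    result.append((e, f_min, f_max))
                if level > 0:
                    par = self._parent(e, level)
                    up = s[level - 1]
                    up[par] = up.get(par, 0) + (f_min if is_hhh else s_e)
        return result
```

With enough counters to avoid evictions, inserting ten units of 1.2.3.4, one unit each of 1.2.3.10 through 1.2.3.19, and one unit of 9.9.9.9, then calling `output(0.4)`, reports 1.2.3.4 with bounds (10, 10) and 1.2.3.* with bounds (20, 20). The reported bounds are on unconditioned counts, as in line 7 of the pseudocode.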

The following lemma is useful in proving that our one-dimensional algorithm satisfies various
nice properties.

Lemma 3.1. Define $H_p \subseteq P$ as the set $\{h : h \in P, h \prec p, \nexists h' \in P : h \prec h' \prec p\}$. Then in one
dimension, $F_p = f(p) - \sum_{h \in H_p} f(h)$.

Proof. By Definition 2.4,

$$F_p = \sum_{(e \in S) \wedge (e \preceq p) \wedge (e \not\preceq P_p)} f(e) = f(p) - \sum_{(e \in S) \wedge (e \preceq P_p)} f(e).$$

Since the hierarchy is one-dimensional, for each $e \in S$ such that $e \preceq P_p$, there is exactly one $h \in H_p$
such that $e \preceq h$ (otherwise, there would be $h \neq h'$ in $H_p$ such that $h \prec h'$). Thus,

$$f(p) - \sum_{(e \in S) \wedge (e \preceq P_p)} f(e) = f(p) - \sum_{h \in H_p} \sum_{(e \in S) \wedge (e \preceq h)} f(e) = f(p) - \sum_{h \in H_p} f(h).$$

□

Theorem 3.2. Using $O(\frac{H}{\epsilon})$ space, our one-dimensional algorithm satisfies the Accuracy and
Coverage requirements of Definition 2.4.

Proof. By Equation (1), each instance of Space Saving requires $\frac{1}{\epsilon}$ counters, corresponding to $O(\frac{1}{\epsilon})$
space, in order to estimate the unconditioned frequency of each item assigned to it within additive
error $\epsilon N$. Consequently, the Accuracy requirement is satisfied using $O(\frac{H}{\epsilon})$ space in total.

To prove coverage, we first show by induction that $s_p = \sum_{h \in H_p} f_{\min}(h)$. This is true at level $L$
because in this case $s_p = 0$ and $H_p$ is empty. Suppose the claim is true for all prefixes at level $k$.
Then for $p$ at level $k - 1$,

$$s_p = \sum_{q \in \mathrm{child}(p) \wedge q \in P} f_{\min}(q) + \sum_{q \in \mathrm{child}(p) \wedge q \notin P} s_q$$

$$= \sum_{q \in \mathrm{child}(p) \wedge q \in P} f_{\min}(q) + \sum_{q \in \mathrm{child}(p) \wedge q \notin P} \sum_{h \in H_q} f_{\min}(h)$$

$$= \sum_{h \in H_p} f_{\min}(h),$$

where the first equality holds by inspection of Lines 5-9 of the output procedure, and the second
equality holds by the inductive hypothesis. This completes the induction.

By Lemma 3.1,

$$F_p = f(p) - \sum_{h \in H_p} f(h) \le f_{\max}(p) - \sum_{h \in H_p} f_{\min}(h) = f_{\max}(p) - s_p,$$

where the inequality holds by the Accuracy guarantees. Coverage follows, since our algorithm is
conservative. That is, if item p is not output, then from Line 6 of the output procedure we have
$f_{\max}(p) - s_p < \phi N$, and we've shown $F_p \le f_{\max}(p) - s_p$. □

We remark that under realistic assumptions on the data distribution, our algorithm satisfies the
Accuracy and Coverage requirements using space $o(\frac{H}{\epsilon})$. Specifically, [2, Theorem 8] shows that, if
the tail of the frequency distribution (i.e. the quantity $N^{\mathrm{res}(k)}$ for a certain value of $k$) is bounded
by that of the Zipfian distribution with parameter $\alpha$, then Space Saving requires space $O(\epsilon^{-1/\alpha})$ to
estimate all frequencies within error $\epsilon N$. Notice that if the frequency distribution of the stream
itself satisfies this "bounded-tail" condition, then the frequency distributions at higher levels of the
hierarchy do as well. Hence our algorithm requires only space $O(H \epsilon^{-1/\alpha})$ if the tail of the stream is
bounded by that of a Zipfian distribution with parameter $\alpha$.

Theorem 3.3. Our one-dimensional algorithm performs each update operation in time $O(H \log \frac{1}{\epsilon})$
in the case of arbitrary updates, and $O(H)$ time in the case of unitary updates. Each output
operation takes time $O(\frac{H}{\epsilon})$.

Proof. The time bound on insertions is trivial, as an insertion operation requires updating $H$
instances of Space Saving. Each update of Space Saving using a min-heap based implementation
for arbitrary updates requires time $O(\log m)$, where $m = \frac{1}{\epsilon}$ is the number of counters maintained
by each instance of Space Saving. For unitary updates, each insertion to Space Saving can be
processed in $O(1)$ time using the Stream Summary data structure [18].

To obtain the time bound on output operations, notice that although the pseudocode for procedure
OutputHHH1D indicates that we iterate through every possible prefix e, we actually need
only iterate over those e tracked by the instance of Space Saving corresponding to e's label. We
may restrict our search to these e because, for any prefix e not tracked by the corresponding Space
Saving instance, $f_{\max}(e) \le \epsilon N < \phi N$ so e cannot be an approximate HHH. There are at most $\frac{H}{\epsilon}$
such e's because each of the $H$ instances of Space Saving maintains only $\frac{1}{\epsilon}$ counters, and for each e,
the GetEstimateSS call in line 5 and all operations in lines 6-9 require $O(1)$ time. The time bound
follows. □

For all prefixes $p$ in the lattice, define the estimated conditioned count of $p$ to be $F'_p := f_{\max}(p) - s_p$.
By performing a refined analysis of error propagation, we can bound the number of HHHs
output by our one-dimensional algorithm, and use this result to provide Accuracy guarantees on
the estimated conditioned counts.

Theorem 3.4. Let $\epsilon < \frac{\phi}{2}$. The total number of approximate HHHs output by our one-dimensional
algorithm is at most $\frac{1}{\phi - 2\epsilon}$. Moreover, the maximum error in the approximate conditioned counts,
$F'_p - F_p$, is at most $\frac{1}{\phi - 2\epsilon} \epsilon N$.

Proof. We first sketch why not too many approximate HHHs are output. A prefix $p$ is output if
and only if $F'_p \ge \phi N$, and $F'_p \ge F_p$. The key observation is that for each approximate HHH $h \in P$
output by our algorithm, $h$ "contributes" error at most $\epsilon N$ to the estimated conditioned count $F'_p$
of at most one ancestor $p \in P$ of $h$. Therefore, the total error in the approximate conditioned
counts of the output set $P$ is small. Consequently, the sum of the true conditioned counts $F_p$ of all
$p \in P$ is very close to $\phi N |P|$, implying that $|P|$ cannot be much larger than $\frac{1}{\phi}$ since the stream
has length $N$.

We make this argument precise. We showed in proving Theorem 3.2 that for all $p$,
$s_p = \sum_{h \in H_p} f_{\min}(h)$, so

$$F'_p = f_{\max}(p) - s_p = f_{\max}(p) - \sum_{h \in H_p} f_{\min}(h). \qquad (2)$$

Combining Lemma 3.1 and Equation (2) we see that

$$F'_p - F_p = \Big(f_{\max}(p) - \sum_{h \in H_p} f_{\min}(h)\Big) - \Big(f(p) - \sum_{h \in H_p} f(h)\Big) = \big(f_{\max}(p) - f(p)\big) + \sum_{h \in H_p} \big(f(h) - f_{\min}(h)\big).$$

To show that the sum of the true conditioned counts $F_p$ of all $p \in P$ is very close to $\phi N |P|$, we
use

$$\sum_{p \in P} F_p = \sum_{p \in P} F'_p - \sum_{p \in P} (F'_p - F_p) \ge \phi N |P| - \sum_{p \in P} \big(f_{\max}(p) - f(p)\big) - \sum_{p \in P} \sum_{h \in H_p} \big(f(h) - f_{\min}(h)\big).$$

By the Accuracy guarantees, $\sum_{p \in P} (f_{\max}(p) - f(p))$ is at most $|P| \epsilon N$. To bound
$\sum_{p \in P} \sum_{h \in H_p} (f(h) - f_{\min}(h))$, we observe that for any item $h \in P$, $h \in H_p$ for at most one ancestor $p \in P$ (because in
one dimension, if $h \prec p$ and $h \prec p'$ for distinct $p, p' \in P$, then either $p \prec p'$ or $p' \prec p$, contradicting
the fact that $h \in H_p$ and $h \in H_{p'}$). Combining this fact with the Accuracy guarantees, we again
obtain an upper bound of $|P| \epsilon N$. In summary, we have shown that

$$\sum_{p \in P} F_p \ge |P| \phi N - 2\epsilon |P| N = |P| (\phi - 2\epsilon) N.$$

Since the total length of the stream is $N$, and in one dimension each fully specified item contributes
its count to $F_p$ for at most one $p \in P$, it follows that $\sum_{p \in P} F_p \le N$ and hence $|P| \le \frac{1}{\phi - 2\epsilon}$, as claimed.

Lastly, we bound the maximum error $F'_p - F_p$ in any estimated conditioned count. We showed
that

$$F'_p - F_p = \big(f_{\max}(p) - f(p)\big) + \sum_{h \in H_p} \big(f(h) - f_{\min}(h)\big),$$

which, by the Accuracy guarantees, is at most $\epsilon N + |H_p| \epsilon N \le \epsilon N + (|P| - 1) \epsilon N \le \frac{\epsilon}{\phi - 2\epsilon} N$, as
claimed. □



The upper bound on output size provided in Theorem 3.4 is very nearly tight, as there may be
$\frac{1}{\phi}$ exact heavy hitters. For example, with realistic values of $\phi = .01$ and $\epsilon = .0001$, Theorem 3.4 yields
an upper bound of 102.



4 Two-Dimensional Hierarchies

In moving from one to multiple dimensions, only the output procedure must change. In one
dimension, discounting items that were already output as HHHs was simple. There was no
double-counting involved, since no two children of an item p had common descendants. To deal with the
double-counting that arises in multiple dimensions, we use the principle of inclusion-exclusion in a
manner similar to [8] and [7].

At a high level, our two-dimensional output procedure works as follows. As before, we start
at the bottom of the lattice, and compute HHHs one level at a time. For any node p, we have to
estimate the conditioned count for p by discounting the counts of items that are already output
as HHHs. However, Lemma 3.1 no longer holds: it is not necessarily true that $F_p = f(p) - \sum_{q \in H_p} f(q)$
in two or more dimensions, because for fully specified items that have two or more
ancestors in $H_p$, we have subtracted their count multiple times. Our algorithm compensates by
adding these counts back into the sum.

Before formally presenting our two-dimensional algorithm, we need the following theorem. Let
glb(h, h') denote the greatest lower bound of h and h', that is, the unique common descendant q of
h and h' satisfying $\forall p : (q \preceq p) \wedge (p \preceq h) \wedge (p \preceq h') \Rightarrow p = q$. In the case where h and h' have no
common descendants, we treat glb(h, h') as the "trivial item" which has count 0.
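
For bytewise IP prefixes the greatest lower bound can be computed one dimension at a time, as the following sketch shows. The function names `glb_1d` and `glb` are hypothetical, prefixes are assumed to be dotted strings with only trailing wildcards, and `None` stands in for the "trivial item" with count 0.

```python
def glb_1d(a, b):
    """Most general common descendant of two prefixes in one dimension.

    Returns the more specific prefix when one generalizes the other,
    and None when the two prefixes have no common descendant.
    """
    pa = [x for x in a.split(".") if x != "*"]
    pb = [x for x in b.split(".") if x != "*"]
    short, long_ = (pa, pb) if len(pa) <= len(pb) else (pb, pa)
    if long_[:len(short)] != short:
        return None  # the specified bytes disagree
    return a if len(pa) >= len(pb) else b


def glb(h1, h2):
    """Greatest lower bound of two multi-dimensional prefixes, or None."""
    parts = [glb_1d(a, b) for a, b in zip(h1, h2)]
    return None if None in parts else tuple(parts)
```

For example, the greatest lower bound of (1.2.\*.\*, 5.6.7.\*) and (1.\*.\*.\*, 5.6.7.8) is (1.2.\*.\*, 5.6.7.8), whereas prefixes whose sources are 1.2.\*.\* and 1.3.\*.\* share no descendant.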

Theorem 4.1. In two dimensions, let $T_p$ be the set of all $q$ expressible as the greatest lower bound of two distinct elements of $H_p$, but not of 3 or more distinct elements in $H_p$. Then

$$F_p = f(p) - \sum_{h \in H_p} f(h) + \sum_{q \in T_p} f(q).$$

The proof appears in Appendix A.

Below, we give pseudocode for our two-dimensional output procedure. We compute estimated conditioned counts $F'_p = f_{\max}(p) - \sum_{h \in H_p} f_{\min}(h) + \sum_{q \in T_p} f_{\max}(q)$. As in the one-dimensional case, the Accuracy guarantees of the algorithm follow immediately from those of Space Saving.



Coverage requirements are satisfied by combining Theorem 4.1 with the Accuracy guarantees. 

Our two-dimensional algorithm performs each insert operation in $O(H \log \frac{1}{\epsilon})$ time under arbitrary updates, and $O(H)$ time under unitary updates, just as in the one-dimensional case. Although the output operation is considerably more expensive in the multi-dimensional case, experimental results indicate that this operation is not prohibitive in practice (see Section 6).

OUTPUTHHH2D(threshold $\phi$)
    $P = \emptyset$
    for level $l = L$ downto $0$
        for each item $p$ at level $l$
            Let $n$ be the lattice node that $p$ belongs to
            $(f_{\min}(p), f_{\max}(p))$ = GetEstimateSS($SS(n)$, $p$)
            $F'_p = f_{\max}(p)$
            $H_p = \{h \in P$ such that $\nexists h' \in P : h \prec h' \prec p\}$
            for each $h \in H_p$
                $F'_p = F'_p - f_{\min}(h)$
            for each pair of distinct elements $h, h'$ in $H_p$
                $q = \mathrm{glb}(h, h')$
                if $\nexists h_3 \ne h, h'$ in $H_p$ s.t. $q \preceq h_3$
                    $F'_p = F'_p + f_{\max}(q)$
            if $F'_p \ge \phi N$
                $P = P \cup \{p\}$
                print $(p, f_{\min}(p), f_{\max}(p))$
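The output procedure can be sketched in runnable form as follows. This is a minimal Python illustration under stated assumptions, not the C implementation evaluated in Section 6: prefixes are pairs of byte tuples, the function and variable names are hypothetical, the estimates are supplied as a dictionary rather than queried from Space Saving, and a glb that is absent from the dictionary is treated as having count 0. The helper exact_estimates stands in for Space Saving by using exact counts, so that $f_{\min} = f_{\max} = f$.

```python
from collections import Counter
from itertools import combinations


def is_ancestor(anc, desc):
    """True if 2-D prefix `anc` generalizes (or equals) `desc` in both dimensions."""
    return all(d[:len(a)] == a for a, d in zip(anc, desc))


def glb(h, h2):
    """Greatest lower bound of two 2-D prefixes, or None for the trivial item."""
    parts = []
    for a, b in zip(h, h2):
        shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
        if longer[:len(shorter)] != shorter:
            return None          # no common descendant in this dimension
        parts.append(longer)
    return tuple(parts)


def output_hhh_2d(estimates, phi, n):
    """estimates maps each tracked prefix to (f_min, f_max); returns the HHH list."""
    hhh = []
    # Visit prefixes from the most specific level up to the root.
    for p in sorted(estimates, key=lambda x: -(len(x[0]) + len(x[1]))):
        f_max_p = estimates[p][1]
        below = [h for h in hhh if h != p and is_ancestor(p, h)]
        # H_p: output descendants of p with no other output item in between.
        h_p = [h for h in below
               if not any(g != h and is_ancestor(g, h) for g in below)]
        cond = f_max_p - sum(estimates[h][0] for h in h_p)
        for h, h2 in combinations(h_p, 2):
            q = glb(h, h2)
            if q is None:
                continue
            if any(h3 not in (h, h2) and is_ancestor(h3, q) for h3 in h_p):
                continue         # q lies below three or more elements of H_p
            cond += estimates.get(q, (0, 0))[1]
        if cond >= phi * n:
            hhh.append(p)
    return hhh


def exact_estimates(stream):
    """Stand-in for Space Saving: exact counts, so f_min = f_max = f."""
    counts = Counter()
    for src, dst in stream:
        for i in range(len(src) + 1):
            for j in range(len(dst) + 1):
                counts[(src[:i], dst[:j])] += 1
    return {p: (c, c) for p, c in counts.items()}
```

With exact counts, the add-back step is what keeps the root from being reported when all of its traffic is already covered by overlapping HHHs one level below it.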



Using Theorem 4.1, we obtain a non-trivial upper bound on the number of HHHs output by our two-dimensional algorithm. The proof is in Appendix A.

Theorem 4.2. Let $A = 1 + \min(h_1, h_2)$, where $h_i$ is the depth of dimension $i$ of the lattice. For small enough $\epsilon$, the number of approximate HHHs output by our two-dimensional algorithm is at most

$$\frac{2}{A\epsilon}\left(\phi - (1 + A)\epsilon - \sqrt{(\phi - (1 + A)\epsilon)^2 - A^2\epsilon}\right).$$



The bound obtained from Theorem 4.2 appears messy, but yields useful bounds in many realistic settings. For example, for IP addresses at byte-wise granularity, $A = 5$. Plugging in $\phi = .1$, $\epsilon = 10^{-4}$ yields $|P| \le 53$, which is very close to the maximum number of exact HHHs: $A/\phi = 50$. As further examples, setting $\phi = .05$ and $\epsilon = 10^{-5}$ yields a bound of $|P| \le 102$, and setting $\phi = .01$ and $\epsilon = 10^{-6}$ yields a bound of $|P| \le 536$, both of which are reasonably close to $A/\phi$. Of course, the bound of Theorem 4.2 should not be viewed as tight in practice, but rather as a worst-case guarantee on output size.
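The numerical examples above can be reproduced with a few lines of Python. The sketch below assumes the bound of Theorem 4.2 has the form $\frac{2}{A\epsilon}\left(\phi - (1+A)\epsilon - \sqrt{(\phi - (1+A)\epsilon)^2 - A^2\epsilon}\right)$, which is what the quadratic inequality in the appendix yields, and it assumes the error parameters $10^{-4}$, $10^{-5}$, and $10^{-6}$ for the three examples. The function name hhh_output_bound_2d is hypothetical.

```python
import math


def hhh_output_bound_2d(phi: float, eps: float, A: int) -> float:
    """Evaluate the assumed form of the Theorem 4.2 bound on the output size."""
    b = phi - (1 + A) * eps
    disc = b * b - A * A * eps
    if disc < 0:
        raise ValueError("eps is too large for the bound to apply")
    return 2.0 * (b - math.sqrt(disc)) / (A * eps)


# Byte-wise IP addresses in two dimensions: A = 1 + min(4, 4) = 5.
for phi, eps in [(0.1, 1e-4), (0.05, 1e-5), (0.01, 1e-6)]:
    bound = hhh_output_bound_2d(phi, eps, A=5)
    print(f"phi={phi}, eps={eps}: |P| <= {math.floor(bound)}")
```

Under these assumptions the three bounds come out just under 54, about 102.8, and about 536.3, matching the integer values 53, 102, and 536 quoted in the text. The ValueError branch reflects the "small enough $\epsilon$" condition: for larger $\epsilon$ the discriminant is negative and the bound is vacuous.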



Higher Dimensions. In higher dimensions, we can again keep one instance of Space Saving at each node of the hierarchy to compute estimates $f_{\min}(p)$ and $f_{\max}(p)$ of the unconditioned count of each prefix $p$. We need only modify the Output procedure to conservatively estimate the conditioned count of each prefix. We can show that the natural generalization of Theorem 4.1 does not hold in three dimensions. However, we can compute estimated conditioned sublattice counts as

$$F'_p = f_{\max}(p) - \sum_{h \in H_p} f_{\min}(h) + \sum_{(h \in H_p,\, h' \in H_p) \wedge q = \mathrm{glb}(h, h')} f_{\max}(q).$$

Inclusion-exclusion implies that, in any dimension, $F_p \le F'_p$, and hence by outputting $p$ if $F'_p \ge \phi N$ we can satisfy Coverage.

5 Extensions 

Our algorithms are easily adapted to distributed or parallel settings, and can be efficiently implemented in commodity hardware such as ternary content addressable memories.

Distributed Implementation. In many practical scenarios a data stream is distributed across several locations rather than localized at a central node (see, e.g., [16, 21]). For example, multiple sensors may be distributed across a network. We extend our algorithms to this setting.

Multiple independent instances of Space Saving can be merged to obtain a single summary of the concatenation of the distributed data streams with only a constant factor loss in accuracy, as shown in [2]. We use this form of their result:

Theorem 5.1 ([2, Theorem 11], simplified statement). Given summaries of $k$ distributed data streams produced by $k$ instances of Space Saving each with $\frac{1}{\epsilon}$ counters, a summary of the concatenated stream can be obtained such that the error in any estimated frequency is at most $3\epsilon N$, where $N$ is the length of the concatenated stream.

To handle $k$ distributed data streams, we may simply run one instance of our algorithm independently on each stream (with $\frac{1}{\epsilon}$ counters each), and afterward, for each node in the lattice, merge all $k$ corresponding instances of Space Saving into a single instance. After the merge, we have a single instance of Space Saving for each node in the lattice that has essentially the same error guarantees (up to a small constant factor) as a centralized implementation. Our output procedure is exactly as in the centralized implementation.

Parallel Implementation. In all of our algorithms, the update operation involves updating a number of independent Space Saving instances. It is therefore trivial to parallelize this algorithm. We have parallelized this algorithm using OpenMP. Our limited experiments show essentially linear speedup, up to the point where shared memory becomes the limiting constraint.

TCAM Implementation. Recently, there has been an effort to develop network algorithms that utilize Ternary Content Addressable Memories, or TCAMs, to process streaming queries faster. TCAMs are specialized, widely deployed hardware that support constant-time queries for a bit vector within a database of ternary vectors, where every bit position represents 0, 1 or *. The * is a wild card bit matching either a 0 or a 1. In any query, if there are one or more matches, the address of the highest-priority match is returned. Previous work describes a TCAM-based implementation of Space Saving for unitary updates, and shows experimentally that it is several times faster than software solutions [1].

Since our algorithms require the maintenance of $H$ independent instances of Space Saving, it is easy to see that our algorithms can be implemented given access to $H$ separate TCAMs, each requiring just a few KBs of memory. With more effort, we can devise implementations of our algorithms that use just a single commodity TCAM. Commodity TCAMs can store hundreds of thousands or millions of data entries [1], and therefore a single TCAM can store tens of instances of Space Saving even when $\epsilon = .0001$.

Our simplest TCAM-based implementation takes advantage of the fact that TCAMs have extra bits. A typical TCAM has a width of 144 symbols allotted for each entry in the database, and this typically leaves several dozen unused symbols for each entry. The implementation of Space Saving of [1] uses extra bits to store frequencies, but we can use additional unused bits to identify the instance of Space Saving associated with each item in the database.

For illustration, consider the one-dimensional byte-wise IP hierarchy. We associate two extra bits with each entry in the database: 00 will correspond to the top-most level of the hierarchy, 01 to the second level, 10 to the third, and 11 to the fourth. Then we treat each IP address a.b.c.d as four separate searches: a.b.c.d.00, a.b.c.*.01, a.b.*.*.10, and a.*.*.*.11, thereby updating each ancestor of a.b.c.d in turn. The TCAM needs to store the smallest counter for each of the four Space Saving instances, and otherwise the TCAM-based implementation from [1] is easily modified to handle multiple instances of Space Saving on a single TCAM.
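The four searches described above can be generated mechanically. The following Python sketch only builds the ternary search keys as strings; it does not model a TCAM, and the function name tcam_search_keys and the dotted string encoding are assumptions made for illustration.

```python
from typing import List


def tcam_search_keys(ip: str) -> List[str]:
    """Return the four tagged ternary search keys for a dotted IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    keys = []
    # Each successive key wildcards one more trailing byte and carries
    # the two-bit tag of the Space Saving instance it belongs to.
    for generalized, tag in enumerate(["00", "01", "10", "11"]):
        kept = octets[:4 - generalized]
        keys.append(".".join(kept + ["*"] * generalized + [tag]))
    return keys
```

For example, tcam_search_keys("10.1.2.3") yields the keys 10.1.2.3.00, 10.1.2.*.01, 10.1.*.*.10, and 10.*.*.*.11, one per Space Saving instance.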

Alternatively, we could compute approximate unconditioned counts by keeping a single instance 
of Space Saving with (item, mask) pairs as keys, rather than H separate instances of Space Saving. 
It is clear that this approach still satisfies the Accuracy guarantees for each prefix, and has the 
advantage of only having to store the smallest counter for a single instance of Space Saving. 
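As a rough illustration of the single-instance variant, the sketch below keeps one Space Saving summary for unitary updates and feeds it every byte-wise prefix of each address, keyed by the masked prefix together with its mask length. It is a simplified Python stand-in, not the TCAM or C implementation: eviction scans for the minimum in linear time for brevity, and the class and function names are hypothetical.

```python
class SpaceSaving:
    """Minimal Space Saving for unitary updates (linear-time eviction for brevity)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.count = {}   # key -> upper bound f_max on the key's frequency
        self.error = {}   # key -> possible overestimate, so f_min = count - error

    def update(self, key) -> None:
        if key in self.count:
            self.count[key] += 1
        elif len(self.count) < self.capacity:
            self.count[key] = 1
            self.error[key] = 0
        else:
            # Replace the item with the smallest counter and inherit its count.
            victim = min(self.count, key=self.count.get)
            floor = self.count.pop(victim)
            del self.error[victim]
            self.count[key] = floor + 1
            self.error[key] = floor

    def estimate(self, key):
        """Return (f_min, f_max) for key."""
        if key in self.count:
            return self.count[key] - self.error[key], self.count[key]
        full = len(self.count) == self.capacity
        return 0, (min(self.count.values()) if full else 0)


def update_all_prefixes(ss: SpaceSaving, ip: tuple) -> None:
    """Insert every byte-wise prefix of ip, keyed by (masked prefix, mask length)."""
    for mask in range(len(ip) + 1):
        ss.update((ip[:mask], mask))
```

Keying on the masked prefix rather than the full address ensures that all addresses sharing a prefix update the same counter.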

Sliding Windows and Streams with Deletions. Our algorithms as described only work for
insert-only streams, due to our choice of Space Saving as our heavy hitter algorithm. However, 
the accuracy and coverage guarantees of our HHH algorithms still hold even if we replace Space 
Saving with other heavy hitter algorithms. This is because our proofs of accuracy and coverage 
applied the inclusion-exclusion principle to express conditioned counts in terms of unconditioned 
counts, and then used the fact that our heavy hitter algorithm provides accurate estimates on the 
unconditioned counts; this analysis is independent of the heavy hitter algorithm used. Hence we 
can extend our results to additional scenarios by using other algorithms. 

For example, it may be desirable to compute HHHs over only a sliding window of the last $n$ items seen in the stream. [14] presents a deterministic algorithm for computing $\epsilon$-approximate heavy hitters over sliding windows using $O(1/\epsilon)$ space. Thus, by replacing Space Saving with this algorithm, we obtain an algorithm that computes approximate HHHs over sliding windows using space $O(H/\epsilon)$, which asymptotically matches the space usage of our algorithm. However, it appears this algorithm is markedly slower and less space-efficient in practice.

Similarly, many sketch-based heavy hitter algorithms such as that of [9] can compute $\epsilon$-approximate heavy hitters, even in the presence of deletions, using space $O(\frac{1}{\epsilon} \log N)$. By replacing Space Saving with such a sketch-based algorithm, we obtain an HHH algorithm using space $O(\frac{H}{\epsilon} \log N)$ that can handle streams with deletions. (As noted previously, this variation was mentioned in [8].)

6 Experimental Results 

We have implemented two versions of our algorithm in C and tested them using GCC version 4.1.2 on a host with four single-core 64-bit AMD Opteron 850 processors each running at 2.4GHz with a 1MB cache and 8GB of shared memory. The first version, termed hhh below, uses a heap-based implementation of Space Saving that can handle arbitrary updates, while the second version, termed unitary below, uses the Stream Summary data structure and can only handle unitary updates. Both versions use an off-the-shelf implementation from [1] for Space Saving; further optimizations, as well as different tradeoffs between time and space, may be possible by modifying the off-the-shelf implementation. We have used a real packet trace from www.caida.org [3] in all experiments below. (We have tried other traces to confirm that these results are representative. Note that all of our graphs are in color and may not display well in grayscale.) Throughout our experiments, all algorithms define the frequency of an IP address or an IP address pair to be the number of packets associated with that item, as opposed to the number of bytes of raw data. This ensures that all algorithms (including unitary) process exactly the same updates. Consequently, the stream length $N$ in all of our experiments refers to the number of packets in the stream (i.e. the prefix of the packet trace [3] that we used).

We tested our algorithms at both byte-wise and bit-wise granularities in one and two dimensions. Bit-wise hierarchies are more expensive to handle, as $H$, the number of nodes in the lattice structure implied by the hierarchy, becomes much larger. However, it may be useful to track approximate HHHs at bit-wise granularity in many realistic situations. For example, a single entity might control a subnet of IP addresses spanning just a few bits rather than an entire byte. However, we observed similar (relative) behavior between all algorithms at both bit-wise and byte-wise granularity, and thus we display results only for byte-wise hierarchies for succinctness.

For comparison we also implemented the full and partial ancestry algorithms from [8], labeled full and partial respectively. We compare the algorithms' performance in several respects: time and memory usage, the size of the output set, and the accuracy of the unconditioned count estimates. Our algorithm performs at least as well as the other two in terms of output size and accuracy. Except for extremely small values of $\epsilon$ (less than about .0001), which correspond to extremely high accuracy guarantees, our two-dimensional algorithm is also significantly faster (more than three times faster for some parameter settings of high practical interest). Our one-dimensional algorithm is also faster than its competitors for values of $\epsilon$ greater than about .01, and competitive across all values of $\epsilon$. Our algorithm uses slightly more memory than its competitors. Below, we discuss each aspect separately.

In summary, our one-dimensional algorithm is competitive in practice with existing solutions, 
and possesses other desirable properties that existing solutions lack, such as improved simplicity and 
ease of implementation, improved worst-case guarantees, and the ability to preallocate space even 
without knowledge of the stream length. Our two-dimensional algorithm possesses all of the same 
desirable properties, and is also significantly faster than existing solutions for parameter values of 
primary practical interest. The primary disadvantage of our algorithms is slightly increased space 
usage. 

All of our implementations are available online at [19].

Memory. Both versions of our algorithm use more memory than full and partial. The difference between hhh, partial, and full is a small constant factor; unitary uses about twice as much space as hhh. The largest difference in space usage appears in one dimension, as shown in Figure 4a. The difference is much smaller in two dimensions, as shown in Figure 4b. In both cases, the better space usage of partial comes at the cost of significantly decreased accuracy and increased output size, as discussed below. We conclude that in situations where the decreased accuracy of partial cannot be tolerated, the memory usage of our algorithms is not a major disadvantage, as hhh and full have similar memory requirements, especially in two dimensions.

Ideally, we would be able to present our results with the independent variable being a programmer-level object such as memory usage, rather than the error parameter $\epsilon$. In practice, a programmer may be allowed at most 1 MB of space to deploy an HHH algorithm on a network sensor, and have to optimize speed and accuracy subject to this constraint. But while the mapping between $\epsilon$ and memory usage is straightforward for our algorithm (the Space Saving implementation uses 36 bytes per counter, and we use a fixed $\frac{H}{\epsilon}$ counters), this mapping is less clear for partial and full, as their space usage is data dependent, with counters added and pruned over the course of execution. Figures 4c and 4d show the empirical mapping between space usage and $\epsilon$ for a fixed stream length of $N = 30$ million with one- and two-dimensional byte-wise hierarchies. This setting highlights the importance of our improved worst-case space bounds, even though our algorithm uses slightly more space in practice. It can be imperative to guarantee assigned memory will not be exceeded, and our algorithm allows a more aggressive choice of error parameter while maintaining a worst-case guarantee.

[Figure 4: Memory and speed comparisons. (a) Maximum memory usage in one dimension over all stream lengths $N$. For hhh and unitary, space usage does not depend on $N$; space usage only varied with $N$ for partial and full. (b) Maximum memory usage in two dimensions over all stream lengths $N$. For hhh and unitary, space usage does not depend on $N$; space usage only varied with $N$ for partial and full. (c) Memory usage in one dimension for fixed stream length. (d) Memory usage in two dimensions for fixed stream length. (e) Speed comparison in two dimensions with high $\epsilon$. (f) Speed comparison in two dimensions with medium $\epsilon$. (g) Speed comparison in two dimensions. (h) Speed comparison in one dimension. (i) Speed comparison in one dimension. (j) Speed comparison in one dimension with low $\epsilon$. (k) Speed comparison in one dimension for fixed stream length. (l) Speed comparison in two dimensions for fixed stream length.]

[Figure 5: Accuracy and output size comparisons. (a) Output size comparison. For each algorithm, we display the maximum output size over all streams tested (for the given setting of $\epsilon$ and $\phi$). (b) Accuracy comparison in one dimension. (c) Accuracy comparison in two dimensions.]
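The mapping between the error parameter and memory for our algorithm is simple enough to compute directly. The Python sketch below assumes one Space Saving instance with $1/\epsilon$ counters at each of the $H$ lattice nodes and the 36 bytes per counter quoted above; it further assumes that 1 MB means $10^6$ bytes and ignores any per-instance overhead. The function names are hypothetical.

```python
BYTES_PER_COUNTER = 36  # figure quoted in the text for the Space Saving implementation


def hhh_memory_bytes(num_nodes: int, inverse_eps: int) -> int:
    """Counter memory for num_nodes Space Saving instances with 1/eps counters each."""
    return num_nodes * inverse_eps * BYTES_PER_COUNTER


def max_inverse_eps(budget_bytes: int, num_nodes: int) -> int:
    """Largest 1/eps (i.e. smallest eps) whose counters fit within the budget."""
    return budget_bytes // (num_nodes * BYTES_PER_COUNTER)


# Byte-wise IP hierarchies: H = 5 nodes in one dimension, H = 25 in two.
print(hhh_memory_bytes(5, 1000))       # one dimension, eps = .001
print(hhh_memory_bytes(25, 1000))      # two dimensions, eps = .001
print(max_inverse_eps(1_000_000, 25))  # finest eps that fits in a 1 MB budget
```

Under these assumptions, $\epsilon = .001$ costs 180,000 bytes in one dimension and 900,000 bytes in two, so a 1 MB budget in two dimensions supports $\epsilon$ only slightly below .001.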

We emphasize that we did not attempt to optimize memory usage for our algorithms using characteristics of the data, as suggested in Theorem 3.2. It is therefore likely that our algorithms can function with less memory than partial and full in many practical settings.

Note that the running time and memory usage are independent of $\phi$, as $\phi$ only affects the output stage, which we have not included in our measurements as the resources consumed by this stage were negligible.

Time. We observe that in both one and two dimensions, both unitary and hhh are faster than partial and full except for extremely small values of $\epsilon$. The speed of each algorithm for each setting of $\epsilon$ is illustrated for a fixed stream length of $N = 30$ million in Figures 4k and 4l; our algorithms are fastest in one dimension for $\epsilon$ greater than about .01 and in two dimensions for $\epsilon$ greater than about .0001.

We show how runtime grows with stream length for fixed values of $\epsilon$ in Figures 4e through 4j. For concreteness, on a stream with $N = 250$ million, unitary processes about 2.2 million updates per second in one dimension at a byte-wise granularity when $\epsilon = .1$, while hhh processes 1.85 million, partial 1.3 million, and full processes 1.4 million. Here, $N$ corresponds to the number of packets (not weighted by size), and the updates per second statistic specifies the number of packets processed per second by our implementation. In two dimensions for $\epsilon = .1$, unitary processes over 370,000 updates per second and hhh processes 300,000, while partial processes 71,000, and full processes about 100,000. Thus, our algorithms ran more than three times faster than partial and full for this particular setting of parameters.

Output Size. The output size and accuracy are measures of the quality of output and depend on the value of $\phi$. All three algorithms produce a near-optimal output size, with partial consistently outputting the largest sets. The largest difference observed is shown in Figure 5a.



Accuracy. We define the relative error of the output to be $\max_{p \in \text{output}} \frac{f_{\max}(p) - f(p)}{\epsilon N}$. Clearly, the relative error is between 0 and 1, because of the accuracy guarantees of our algorithms. We find that the relative error can vary significantly for all algorithms, but our algorithm uniformly performs best. The relative error of the partial ancestry algorithm is often close to the theoretical upper bound of 1, making it by far the least accurate of the algorithms tested.

TCAM Simulations. We simulated our TCAM-conscious implementation of our algorithm on the same packet traces as above, in order to estimate the number of TCAM operations our implementation requires per packet processed. Previous work [1] experimentally demonstrates that TCAM READ, WRITE, and SEARCH operations all take roughly the same amount of time. Thus, we counted the total number of READ, WRITE, and SEARCH operations our TCAM implementation required, without distinguishing between the three. We found that for one-dimensional IP addresses at byte-wise granularity, each packet required about 14 TCAM operations on average, or 2.8 TCAM operations per instance of Space Saving maintained by our algorithm. This is slightly better than the worst-case behavior of the implementation of [1], which requires up to 4 TCAM operations per update. Our two-dimensional algorithm at byte-wise granularity requires about 65 TCAM operations per packet; since the two-dimensional algorithm maintains 25 instances of Space Saving, this translates to only 2.6 TCAM operations per instance of Space Saving. We attribute this improvement in TCAM operations per instance of Space Saving to the fact that the frequency distribution at high nodes in the two-dimensional lattice is highly non-uniform.

7 Conclusion 

The trend in the literature on the approximate HHH problem has been towards increasingly complicated algorithms. In this work, we present what is perhaps the simplest algorithm for HHHs in arbitrary dimension, and demonstrate that it is superior to the existing standard in many respects, and competitive in all others. We believe our algorithm offers the best tradeoff between simplicity and performance.
A Proofs of Theorems 4.1 and 4.2



Proof of Theorem 4.1. First note that it is possible, using the inclusion-exclusion principle, to show that

$$F_p = f(p) - \sum_{h \in H_p} f(h) + \sum_{(h, h' \in H_p) \wedge q = \mathrm{glb}(h, h')} f(q) - \sum_{(h, h', h'' \in H_p) \wedge q = \mathrm{glb}(h, h', h'')} f(q) + \cdots$$

We claim that for all $u$ expressible as the greatest lower bound of more than two elements of $H_p$, the total contribution of $f(u)$ to the above sum is 0. Indeed, suppose that $u = (u_1, u_2)$ is a descendant of exactly $m$ such elements, $h_1, h_2, \ldots, h_m$ in $H_p$. Since $u \preceq h_\alpha$ for $\alpha$ in $\{1, \ldots, m\}$, these $m$ heavy hitter elements can be written as $h_1 = (P^{i_1} u_1, P^{j_1} u_2)$, $h_2 = (P^{i_2} u_1, P^{j_2} u_2)$, $\ldots$, $h_m = (P^{i_m} u_1, P^{j_m} u_2)$, where $(P^{i} u_1, P^{j} u_2)$ denotes the element obtained from $(u_1, u_2)$ by generalizing $i$ times on the first dimension and $j$ times on the second dimension. Renumbering if necessary, assume the nodes are sorted on the generality of their first component so that $i_1 \le i_2 \le \cdots \le i_m$. It is clear that there are no equalities in the sequence because if $i_\alpha = i_\beta$ then either $h_\alpha \preceq h_\beta$ or $h_\beta \preceq h_\alpha$, which contradicts that these are from $H_p$. When the corresponding relationships between the second components are examined it can be seen that the increasing sort on the first component forces a decreasing order on the second component, $j_1 > j_2 > \cdots > j_m$ (since if $\alpha < \beta$ and $j_\alpha \le j_\beta$ then because $i_\alpha < i_\beta$ the contradiction $h_\alpha \prec h_\beta$ is reached). Thus the $m$ elements are in a linear structure with endpoints $h_1$ and $h_m$. Clearly the first component of $u$ is the first component of $h_1$ and similarly the second component of $u$ is the second component of $h_m$. With this in hand it is clear that $u$ is the greatest lower bound of any subgroup of the $m$ elements that includes $h_1$ and $h_m$. There are $\binom{m-2}{k}$ ways to pick $k$ middle terms and thus there are $\binom{m-2}{k}$ ways in which node $u$ appears as the greatest lower bound of $k + 2$ elements from $H_p$. Returning to the sum

$$F_p = f(p) - \sum_{h \in H_p} f(h) + \sum_{(h, h' \in H_p) \wedge q = \mathrm{glb}(h, h')} f(q) - \sum_{(h, h', h'' \in H_p) \wedge q = \mathrm{glb}(h, h', h'')} f(q) + \cdots$$

it is now clear that $f(u)$ will appear once in the sum over pairs, $m - 2$ times in the sum over triples, and, in general, $\binom{m-2}{k}$ times in the sum over groups of size $k + 2$. When combined with the sign structure in the sum this gives a resulting contribution from $u$ of

$$f(u) \sum_{j=0}^{m-2} (-1)^j \binom{m-2}{j} = f(u)(1 - 1)^{m-2} = 0.$$

Thus in the two dimensional case

$$F_p = f(p) - \sum_{h_1 \in H_p} f(h_1) + \sum_{q \in T_p} f(q)$$

as claimed. □
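The cancellation at the heart of this proof is easy to check numerically. The short Python sketch below computes the net coefficient with which $f(u)$ enters the inclusion-exclusion sum when $u$ lies below exactly $m$ elements of $H_p$; the function name is hypothetical.

```python
from math import comb


def contribution_multiplier(m: int) -> int:
    """Net coefficient of f(u) when u lies below exactly m elements of H_p (m >= 2)."""
    # u is the glb of every subgroup containing both endpoints plus k of the
    # m - 2 middle elements; groups of size k + 2 enter with sign (-1)^k.
    return sum((-1) ** k * comb(m - 2, k) for k in range(m - 1))
```

The coefficient is 1 for $m = 2$, which is exactly the case $u \in T_p$, and 0 for every $m \ge 3$, which is why only the glbs of exactly two elements survive in the final formula.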



Proof of Theorem 4.2. The proof will closely parallel that of Theorem 3.4. We bound the total error in the estimated conditioned counts, aggregated over all $p \in P$, and this will imply that the sum of the true conditioned counts of all $p \in P$ is large. Hence there cannot be too many approximate HHHs output.

We showed in Theorem 4.1 that

$$F_p = f(p) - \sum_{h_1 \in H_p} f(h_1) + \sum_{q \in T_p} f(q).$$

Therefore,

$$F'_p - F_p = \left(f_{\max}(p) - f(p)\right) + \sum_{h_1 \in H_p} \left(f(h_1) - f_{\min}(h_1)\right) + \sum_{q \in T_p} \left(f_{\max}(q) - f(q)\right).$$

Our goal is to show that the sum of the true conditioned counts of all $p \in P$ is large by bounding the total error in the estimated conditioned counts, aggregated over all $p \in P$. To this end, consider the sum

$$\sum_{p \in P} F_p = \sum_{p \in P} F'_p - \sum_{p \in P} (F'_p - F_p) \ge |P|\phi N - \sum_{p \in P} (F'_p - F_p)$$

$$= |P|\phi N - \sum_{p \in P} \left(f_{\max}(p) - f(p)\right) - \sum_{p \in P} \sum_{h_1 \in H_p} \left(f(h_1) - f_{\min}(h_1)\right) - \sum_{p \in P} \sum_{q \in T_p} \left(f_{\max}(q) - f(q)\right).$$

We refer to the second term on the right hand side of the last expression, $\sum_{p \in P} (f_{\max}(p) - f(p))$, as "Term-Two error", the third term, $\sum_{p \in P} \sum_{h_1 \in H_p} (f(h_1) - f_{\min}(h_1))$, as "Term-Three error", and the fourth term, $\sum_{p \in P} \sum_{q \in T_p} (f_{\max}(q) - f(q))$, as "Term-Four error". By the Accuracy guarantees, it is immediate that the Term-Two error is bounded above by $|P|\epsilon N$.

In order to bound Term-Three error, we must briefly introduce the notion of comparable items in a lattice. Two elements $x$ and $y$ are comparable under the $\preceq$ relation if the label of $y$ is less than or equal to that of $x$ on every attribute. Let $A$ be the size of the largest antichain in the lattice, that is, the maximum size of any subset of prefixes such that any two items in the subset are incomparable. It was shown in [8] that $A = 1 + \min(h_1, h_2)$. We show that $\sum_{p \in P} |H_p| \le A|P|$; it then follows by the Accuracy guarantees that the Term-Three error is bounded above by $|P|A\epsilon N$.

To this end, for any $h \in P$, consider the set $B_h = \{p \in P : h \in H_p\}$. We claim that $|B_h| \le A$, since all the items in $B_h$ must be incomparable. Indeed, suppose $p, q \in B_h$ and the label of $q$ is less than the label of $p$ on both attributes. Then $h \preceq q \prec p$, so by definition of $H_p$, $h \notin H_p$, which is a contradiction. Thus, $\sum_{p \in P} |H_p| = \sum_{h \in P} |B_h| \le A|P|$.
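As a sanity check on the value of $A$ quoted from [8], the following brute-force Python sketch computes the largest antichain among the generalizations of a single fully specified item. It models each generalization as a grid point $(i, j)$, where $i$ and $j$ count how many times the item has been generalized in each dimension; this modeling choice and the function names are assumptions made for illustration, and the search is only practical for small depths.

```python
from itertools import combinations


def comparable(x, y):
    """True if one grid point is at most the other on every attribute."""
    return (x[0] <= y[0] and x[1] <= y[1]) or (y[0] <= x[0] and y[1] <= x[1])


def largest_antichain(h1: int, h2: int) -> int:
    """Size of the largest set of pairwise incomparable points in the (h1+1) x (h2+1) grid."""
    nodes = [(i, j) for i in range(h1 + 1) for j in range(h2 + 1)]
    best = 0
    for size in range(1, len(nodes) + 1):
        found = any(
            all(not comparable(x, y) for x, y in combinations(subset, 2))
            for subset in combinations(nodes, size)
        )
        if not found:
            break  # every subset of an antichain is an antichain, so stop here
        best = size
    return best
```

For depths $(2, 3)$ the search returns 3 and for the byte-wise IP case $(4, 4)$ it returns 5, in agreement with $A = 1 + \min(h_1, h_2)$.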

Finally, we may bound the Term-Four error by $\frac{A|P|^2}{4}\epsilon N$. This will clearly follow from the Accuracy guarantees if we can bound $\sum_{p \in P} |T_p| \le \frac{A|P|^2}{4}$. To this end, for each $p \in P$ let $G_p$ be a graph on $|H_p|$ vertices, where edge $(h_1, h_2) \in E(G_p)$ if and only if $\mathrm{glb}(h_1, h_2) \in T_p$. It is clear that $|T_p| = |E(G_p)|$. We claim $G_p$ is a triangle-free graph; it then follows by Turán's theorem [24] that $|T_p| \le \frac{|H_p|^2}{4}$. For three distinct vertices $h_1, h_2, h_3 \in H_p$, let $u_1 = \mathrm{glb}(h_1, h_2)$, $u_2 = \mathrm{glb}(h_2, h_3)$ and $u_3 = \mathrm{glb}(h_1, h_3)$. We show that if $u_1$, $u_2$ and $u_3$ are all in $T_p$, then for at least one $i$, $u_i$ is a descendant of $h_1$, $h_2$, and $h_3$, contradicting $u_i \in T_p$.

Write $h_i = (h_{i,1}, h_{i,2})$ for each $i \in \{1, 2, 3\}$. By assumption $(h_i, h_j)$ share a common descendant for any pair $(i, j)$, so we may assume (renumbering if necessary) that $h_{1,1} \preceq h_{2,1} \preceq h_{3,1}$ as one-dimensional objects. The remainder of the proof now closely parallels that of Theorem 4.1. It is clear that there are no equalities in the sequence because if $h_{i,1} = h_{j,1}$ then either $h_i \preceq h_j$ or $h_j \preceq h_i$, which contradicts that these are from $H_p$. For the same reason, it can be seen that the increasing sort on the first component forces a decreasing order on the second component, i.e., $h_{3,2} \prec h_{2,2} \prec h_{1,2}$. Consequently, $u_3 = \mathrm{glb}(h_1, h_3) = (h_{1,1}, h_{3,2})$ is a descendant of $h_1$, $h_2$, and $h_3$, contradicting $u_3 \in T_p$.

So we have shown that $|T_p| \le \frac{|H_p|^2}{4}$. Since $\sum_{p \in P} |H_p| \le A|P|$ and trivially $|H_p| \le |P|$ for all $p \in P$, Hölder's Inequality implies that $\sum_{p \in P} |T_p| \le \sum_{p \in P} \frac{|H_p|^2}{4} \le \frac{A|P|^2}{4}$.
Thus, we have shown that

$$\sum_{p \in P} F_p \ge |P|\phi N - |P|\epsilon N - A|P|\epsilon N - \frac{A|P|^2}{4}\epsilon N.$$

Now note that $AN \ge \sum_{p \in P} F_p$, because each fully-specified item $e$ can only contribute to the true conditioned counts of incomparable approximate HHHs. For if $e$ contributes to the conditioned count of both $p$ and $q$, then $e \preceq p \wedge e \not\preceq P_p$ and $e \preceq q \wedge e \not\preceq P_q$. If $p$ and $q$ are comparable, then this implies either $q \preceq p$ or $p \preceq q$, contradicting the fact that $e \not\preceq P_p$ and $e \not\preceq P_q$. Thus, we see that

$$AN \ge \left(\phi N - (A + 1)\epsilon N\right)|P| - \frac{A|P|^2}{4}\epsilon N.$$

Dividing through by $N$ and subtracting $A$ from both sides yields

$$0 \ge -A + \left(\phi - (A + 1)\epsilon\right)|P| - \frac{A\epsilon}{4}|P|^2.$$

Using the quadratic formula, this holds if and only if

$$|P| \le -2\,\frac{(1 + A)\epsilon - \phi + \sqrt{(\phi - (1 + A)\epsilon)^2 - A^2\epsilon}}{A\epsilon}$$

or

$$|P| \ge -2\,\frac{(1 + A)\epsilon - \phi - \sqrt{(\phi - (1 + A)\epsilon)^2 - A^2\epsilon}}{A\epsilon},$$

and we can rule out the latter case for small enough $\epsilon$ via trivial upper bounds on $|P|$. □